Multi-label Contrastive Predictive Coding

Neural Information Processing Systems

Variational mutual information (MI) estimators are widely used in unsupervised representation learning methods such as contrastive predictive coding (CPC). A lower bound on MI can be obtained from a multi-class classification problem, where a critic attempts to distinguish a positive sample drawn from the underlying joint distribution from (m-1) negative samples drawn from a suitable proposal distribution. Using this approach, MI estimates are bounded above by log m, and can thus severely underestimate the true MI unless m is very large. To overcome this limitation, we introduce a novel estimator based on a multi-label classification problem, where the critic needs to jointly identify multiple positive samples at the same time. We show that, using the same number of negative samples, multi-label CPC is able to exceed the log m bound while still being a valid lower bound on mutual information. We demonstrate that the proposed approach leads to better mutual information estimation, yields empirical improvements in unsupervised representation learning, and beats the current state of the art in knowledge distillation on 10 out of 13 tasks.
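The log m ceiling described in the abstract can be seen in a small numerical sketch. This is an illustration only, not the authors' code: the function name `cpc_estimate` and the convention that critic outputs are log-scores are assumptions made here. The per-sample estimate is log m plus the log softmax probability of the positive, and since that probability never exceeds 1, the estimate never exceeds log m.

```python
import math
import random

def cpc_estimate(pos_score, neg_scores):
    """Per-sample CPC-style estimate: log(m * softmax probability of the positive).

    pos_score and neg_scores are critic log-scores (hypothetical convention);
    m = 1 + number of negatives.
    """
    scores = [pos_score] + list(neg_scores)
    m = len(scores)
    top = max(scores)  # subtract the max for numerical stability
    log_norm = top + math.log(sum(math.exp(s - top) for s in scores))
    return math.log(m) + pos_score - log_norm

random.seed(0)
m = 8
# Even a critic that is extremely confident about the positive sample
# cannot push the estimate above log(m).
confident = cpc_estimate(50.0, [random.gauss(0.0, 1.0) for _ in range(m - 1)])
# A critic that cannot tell the samples apart yields an estimate of zero.
uninformed = cpc_estimate(0.0, [0.0] * (m - 1))
```

With m = 8 the confident critic saturates at log 8 (about 2.08 nats), no matter how large the true mutual information is.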


Review for NeurIPS paper: Multi-label Contrastive Predictive Coding

Neural Information Processing Systems

Summary and Contributions: The authors propose a multi-label version of contrastive predictive coding (CPC). It transforms the CPC loss from one defined separately on each positive pair and its set of negative pairs, for i = 1...n, into one where all positive pairs (and all possible negatives amongst them) make up a single softmax distribution. This can be thought of as a 'multi-label' classification task in which the network is trained so that the top n predictions from that distribution are the n positive examples. The motivation behind this technique is that: (1) The regular CPC loss (which is a lower bound on the mutual information between X and Y) is upper bounded by log(m), where m is the total number of pairs used (i.e. 1 positive pair and m-1 negative pairs). If log(m) is much lower than I(X;Y), then we underestimate mutual information. Its disadvantage, however, is that for some range of alpha the loss will no longer be a lower bound on I(X;Y). In essence, the authors show that for their proposed method (alpha-ML-CPC), one can derive the range of alphas for which the loss lower-bounds I(X;Y) as a function of m and n, and these ranges are quite large even for modest values of m and n.
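The joint softmax described in the summary can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the name `ml_cpc_estimate` and the layout `scores[j][k]` (row j holds the critic log-scores for x_j, with column 0 the positive and the remaining m-1 columns the negatives) are hypothetical. The sketch covers only the unweighted joint softmax and does not include the alpha reweighting the review refers to.

```python
import math

def ml_cpc_estimate(scores):
    """Unweighted multi-label estimate over n rows of m critic log-scores.

    scores[j][0] is the positive pair for row j (assumed layout);
    scores[j][1:] are that row's negatives.
    """
    n, m = len(scores), len(scores[0])
    flat = [s for row in scores for s in row]
    top = max(flat)  # subtract the max for numerical stability
    # One normalizer shared by all n * m pairs, instead of one per row.
    log_norm = top + math.log(sum(math.exp(s - top) for s in flat))
    # Average log-score of the n positives under that single softmax.
    mean_pos = sum(row[0] for row in scores) / n
    return math.log(n * m) + mean_pos - log_norm

# Two rows (n = 2), four pairs per row (m = 4), confident critic.
confident = ml_cpc_estimate([[50.0, 0.0, 0.0, 0.0],
                             [50.0, 0.0, 0.0, 0.0]])
# With a single row the estimate reduces to the per-sample CPC form.
single_row = ml_cpc_estimate([[2.0, 0.0, 0.0]])
```

In this unweighted form the n positives share one softmax, so their probabilities sum to at most 1 and the value still cannot exceed log m; the example above saturates at log 4.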

